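Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or depend on an unprincipled patchwork collection of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms \citep{kostas2020asynchronous} to propose a principled solution to this problem. The class of algorithms we propose can be distributed over the internet and run in real time. When a given edge fails to respond to a request for data with sufficient speed, this is not a problem; the algorithm is designed to function and learn in the edge setting, and network issues are part of this setting. The result is a principled, theoretically grounded RL algorithm designed to be distributed in, and to learn in, this asynchronous environment. In this work, we describe this algorithm and a proposed class of architectures in detail, and demonstrate that they work well in practice in the asynchronous setting, even as network quality degrades.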
There has been great recent advancement in human-computer chat. However, proper evaluation currently requires human judgements that produce notoriously high-variance metrics due to their inherent subjectivity. Furthermore, there is little standardization in the methods and labels used for evaluation, with an overall lack of work to compare and assess the validity of various evaluation approaches. As a consequence, existing evaluation results likely leave an incomplete picture of the strengths and weaknesses of open-domain chatbots. We aim towards a dimensional evaluation of human-computer chat that can reliably measure several distinct aspects of chat quality. To this end, we present our novel human evaluation method that quantifies the rate of several quality-related chatbot behaviors. Our results demonstrate our method to be more suitable for dimensional chat evaluation than alternative Likert-style or comparative methods. We then use our validated method and existing methods to evaluate four open-domain chat models from the recent literature.
Autonomous vehicles (AVs) are advancing rapidly and within the next 20 years will become a central part of our society. However, especially in the early stages of deployment, incidents involving AVs are expected. In the event of such incidents, choices will need to be made that require ethical judgement, e.g., deciding between colliding with a group of pedestrians or a rigid barrier. For an AV to undertake such ethical decision making and path planning, simulation models of the situation will be required that run in real time on board the AV. These models will enable path planning and ethical decision making to be undertaken based on predetermined collision injury severity levels. In this research, models are developed for path planning and ethical decision making that predetermine knowledge of the possible collision injury severities, i.e., the peak deformation of the AV when colliding with the rigid barrier or the impact velocity of the AV when colliding with a pedestrian. Based on such knowledge and using fuzzy logic, a novel nonlinear weighted utility cost function for the collision injury severity levels is developed. This allows the model-based predicted collision outcomes arising from AV peak deformation and AV-pedestrian impact velocity to be examined separately via weighted utility cost functions with a common structure. The general form of the weighted utility cost function exploits a fuzzy sets approach, thus allowing common utility costs from the two separate utility cost functions to be meaningfully compared. A decision-making algorithm, which makes use of a utilitarian ethical approach, ensures that the AV will always steer onto the path that represents the lowest injury severity level, and hence the lowest utility cost to society.
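The utilitarian selection step described above can be illustrated with a minimal sketch. It is not the authors' model: the triangular membership functions, the set boundaries, the weights in `WEIGHTS`, and the normalisation of severities to the range 0 to 1 are all hypothetical choices made for illustration. A single shared cost function also stands in for the paper's two separate cost functions with a common structure.

```python
# Hypothetical fuzzy sets over a normalised severity in [0, 1]:
# each entry is (left, peak, right) of a triangular membership function.
SETS = {
    "minor": (-0.5, 0.0, 0.5),
    "moderate": (0.0, 0.5, 1.0),
    "severe": (0.5, 1.0, 1.5),
}
# Hypothetical weights; unequal steps make the cost nonlinear in severity.
WEIGHTS = {"minor": 1.0, "moderate": 4.0, "severe": 10.0}


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 at or outside the ends, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def utility_cost(normalised_severity: float) -> float:
    """Membership-weighted average of the per-set weights."""
    s = min(max(normalised_severity, 0.0), 1.0)  # clamp to [0, 1]
    memberships = {name: triangular(s, *p) for name, p in SETS.items()}
    total = sum(memberships.values())
    return sum(WEIGHTS[n] * m for n, m in memberships.items()) / total


def choose_path(severity_by_path: dict[str, float]) -> str:
    """Utilitarian choice: the path with the lowest utility cost."""
    return min(severity_by_path, key=lambda p: utility_cost(severity_by_path[p]))


if __name__ == "__main__":
    # Assumed normalisations: barrier deformation 0.4 m of a 1.0 m maximum,
    # pedestrian impact 45 km/h of a 60 km/h maximum.
    options = {"barrier": 0.4 / 1.0, "pedestrian": 45.0 / 60.0}
    print(choose_path(options))
```

Because both outcomes are mapped onto the same cost scale, a barrier deformation and a pedestrian impact velocity can be compared directly, which is the role the abstract assigns to the common structure of the two cost functions.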
As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires making progress at the systems-level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of Starcraft-2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
In recent years, deep learning has infiltrated every field it has touched, reducing the need for specialist knowledge and automating the process of knowledge discovery from data. This review argues that astronomy is no different, and that we are currently in the midst of a deep learning revolution that is transforming the way we do astronomy. We trace the history of astronomical connectionism from the early days of multilayer perceptrons, through the second wave of convolutional and recurrent neural networks, to the current third wave of self-supervised and unsupervised deep learning. We then predict that we will soon enter a fourth wave of astronomical connectionism, in which finetuned versions of an all-encompassing 'foundation' model will replace expertly crafted deep learning models. We argue that such a model can only be brought about through a symbiotic relationship between astronomy and connectionism, whereby astronomy provides high quality multimodal data to train the foundation model, and in turn the foundation model is used to advance astronomical research.
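Common data models solve many challenges of standardizing electronic health record (EHR) data, but they are unable to integrate the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide semantically computable representations of biological knowledge and enable the integration of a variety of biomedical data. However, mapping EHR data to OBO Foundry ontologies requires significant manual curation and domain expertise. We introduce a framework for mapping Observational Medical Outcomes Partnership (OMOP) standard vocabularies to OBO Foundry ontologies. Using this framework, we produced mappings for 92,367 conditions, 8,615 drug ingredients, and 10,673 measurement results. Domain experts verified the mapping accuracy, and when examined across 24 hospitals, the mappings covered 99% of conditions and drug ingredients and 68% of measurements. Finally, we demonstrate that OMOP2OBO mappings can aid in the systematic identification of undiagnosed rare disease patients who might benefit from genetic testing.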
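Predicting the economy's short-term dynamics, a vital input to economic agents' decision-making processes, often relies on lagged indicators in linear models. This is typically sufficient during normal times but may prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data, such as retail and wholesale payments, combined with nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models and so improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy by up to 40%, with higher gains during the COVID-19 period. We observe that the contribution of payments data to economic predictions is small and linear during low and normal growth periods. However, the contribution of payments data is large, asymmetric, and nonlinear during periods of strongly negative or positive growth.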
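Online autonomous agents are able to draw on a wide variety of potential sources of task knowledge; however, current approaches invariably focus on only one or two. Here we investigate the challenges and impact of exploiting diverse knowledge sources to learn, in one shot, new tasks for a simulated household mobile robot. The resulting agent, developed in the Soar cognitive architecture, uses the following sources of domain and task knowledge: interaction with the environment, task execution and planning knowledge, human natural language instruction, and responses retrieved from a large language model (GPT-3). We explore the distinct contributions of these knowledge sources and evaluate the performance of different combinations in terms of learning correct task knowledge, human workload, and computational cost. The results from combining all sources demonstrate that integration improves one-shot task learning overall in terms of computational cost and human workload.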
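This paper reports on the state of the art in underground SLAM by discussing the different SLAM strategies and results of six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is placed on lidar-centric SLAM solutions (the go-to approach for virtually all teams in the competition), heterogeneous multi-robot operation (including both aerial and ground robots), and real-world underground operation (from the presence of obscurants to the need to handle tight computational constraints). We do not shy away from discussing the dirty details behind the different SubT SLAM systems, which are often omitted from technical papers. Second, we discuss the maturity of the field by highlighting what is possible with current SLAM systems and what we believe is within reach with some good systems engineering. Third, we outline what we believe are fundamental open problems that are likely to require further research to break through. Finally, we provide a list of open-source SLAM implementations and datasets that were produced during the SubT challenge and related efforts, and that constitute a useful resource for researchers and practitioners.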
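Inflammatory bowel disease (IBD), in particular ulcerative colitis (UC), is graded by endoscopists, and this assessment is the basis for risk stratification and therapy monitoring. Presently, endoscopic characterisation is largely operator dependent, which sometimes leads to undesirable clinical outcomes for patients with IBD. We focus on the Mayo Endoscopic Scoring (MES) system, which is widely used but requires the reliable identification of subtle changes in mucosal inflammation. Most existing deep learning classification methods cannot detect these fine-grained changes, which makes UC grading a challenging task. In this work, we introduce a novel patch-level instance-group discrimination with pretext-invariant representation learning (PLD-PIRL) for self-supervised learning (SSL). Our experiments demonstrate improved accuracy and robustness compared to the baseline supervised network and several state-of-the-art SSL methods. Compared to the baseline (ResNet50) supervised classification, our proposed PLD-PIRL obtained an improvement in top-1 accuracy of 4.75% on hold-out test data and 6.64% on unseen-center test data.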